Alvin Chua, KU Leuven, alvin.chua@asro.kuleuven.be PRIMARY
Ryo Sakai, KU Leuven, ryo.sakai@esat.kuleuven.be
Jan Aerts, KU Leuven, jan.aerts@esat.kuleuven.be
Andrew Vande Moere, KU Leuven, andrew.vandemoere@asro.kuleuven.be
Student Team: Yes
Our procedure consists of three stages: (1) Aggregate & Slice, (2) Design, Filter & Analyze and (3) Communicate. Stage 1 is concerned with rapidly discovering insights, using openly available software to test hypotheses. Stage 2 involves the design and implementation of streamlined tools that optimize the identification of specific patterns. Finally, in stage 3 we present our discoveries with simplified abstractions so that they can be easily understood.
Stage 1: R, QGIS
Stage 2: A series of visualization tools developed in Processing by Ryo Sakai and Alvin Chua at DataVisLab, KU Leuven, consisting of two interactive visualizations and one static visualization. Our first interactive visualization links parallel coordinates to a map, while our second links a timeline to histograms and an origin-destination (OD) map. Finally, our static visualization represents the information displayed on the timeline as a flow diagram to optimize visual search.
Stage 3: Graphviz
Approximately how many hours were spent working on this submission in total? 96 hours
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete? Yes
Video:
https://www.dropbox.com/s/ejiwwncza6xgzy2/KUL-Chua-MC2.mp4
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
MC2.1 – Describe common daily routines for GAStech employees. What does a day in the life of a typical GAStech employee look like? Please limit your response to no more than five images and 300 words.
Following the three-stage process described above, R was used to quickly visualize the GPS data. Fig.1a shows a relationship between employment type and spatial activity. Fig.1b indicates that spatial activity on weekdays follows a routine pattern, in contrast to the more volatile movements on weekends. A key discovery is that engineers, executives and IT staff move across the map in a distinctly different pattern from security and facilities staff. We further processed the data so that movements can be represented as an OD map, in which the average daily pattern is expressed as a series of movements between locations. Locations for the OD map are detected from the time intervals in the GPS data during which a car is stationary for more than 70 seconds (fig.2a). Detected locations are then clustered by proximity and validated against the points of interest (POI) on the tourist map (fig.2b). Fig.3 shows our OD map, where edges are weighted and filtered according to a combination of (1) the number of employees moving from one cluster to another and (2) the number of days on which the connection between two clusters exists. Maps are generated for each employment group based on the average number of employees and days that the employment type has been active. Finally, we present the average daily routine of GAStech employees as directed graphs (fig.4). Each node represents a cluster and edges encode movement between clusters. Edge labels indicate the median hour of day at which the movement occurs. We observe that GAStech is the power centre of the average daily routine, suggesting that it serves as a major hub where the majority of transitions between activities occur. All employment types apart from facilities share similar lunch hours and tend to visit common locations.
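The edge weighting and filtering described above can be sketched as follows. This is a minimal illustration, not the tool used for Fig.3: it assumes each trip has already been reduced to a tuple of (employee, day, origin cluster, destination cluster, departure hour), and the thresholds `min_employees` and `min_days` are hypothetical parameters standing in for the per-group averages mentioned in the text. The median departure hour is kept as the edge label used in the directed graphs.

```python
from collections import defaultdict
from statistics import median


def od_edges(trips, min_employees=2, min_days=3):
    """Aggregate cluster-to-cluster trips into weighted, filtered OD edges.

    trips: iterable of (employee, day, origin, destination, hour) tuples.
    An edge is kept only if it is used by at least `min_employees` distinct
    employees on at least `min_days` distinct days (hypothetical thresholds).
    """
    employees = defaultdict(set)
    days = defaultdict(set)
    hours = defaultdict(list)
    for employee, day, origin, dest, hour in trips:
        if origin == dest:
            continue  # movement inside one cluster is not an OD transition
        key = (origin, dest)
        employees[key].add(employee)
        days[key].add(day)
        hours[key].append(hour)

    edges = {}
    for key, emps in employees.items():
        if len(emps) >= min_employees and len(days[key]) >= min_days:
            edges[key] = {
                "employees": len(emps),
                "days": len(days[key]),
                "median_hour": median(hours[key]),  # used as the edge label
            }
    return edges


# Hypothetical example: cluster names and hours are illustrative only.
trips = [
    ("e1", 6, "Cafe", "GAStech", 7.5),
    ("e2", 6, "Cafe", "GAStech", 8.0),
    ("e1", 7, "Cafe", "GAStech", 7.75),
    ("e2", 8, "Cafe", "GAStech", 8.25),
    ("e1", 6, "GAStech", "Golf", 13.0),
]
print(od_edges(trips))
```

With these inputs only the Cafe to GAStech edge survives (2 employees on 3 days, median hour 7.875); the single Golf trip is filtered out as non-routine.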
a) Movement Patterns by Employment Type
b) Movement Patterns by Employment Type & Day of Week
Fig.1. Comparison of movement patterns by employment type. The NA type comprises GPS data from trucks that do not have driver information. a) Aggregation reveals how movement differs between employment types and shows the regions of the map that are traversed most frequently. b) Small multiples of the movement patterns by day of week reveal routine patterns on weekdays and more volatile movements on weekends.
Fig.2. Locations are detected from time intervals in the GPS data and validated against the provided tourist map. a) Locations on the map distilled from GPS data where a car is stationary for more than 70 seconds. The locations are then clustered within a 10 m radius and overlaid on the tourist map in b).
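The two steps in Fig.2 (stop detection and proximity clustering) can be sketched as below. The 70-second gap and the 10 m radius come from the caption; everything else is an assumption: each car's track is taken to be a time-sorted list of (seconds, latitude, longitude) fixes, a gap between consecutive fixes is read as the car standing still at the earlier fix, and a simple greedy "leader" clustering with haversine distances is used in place of whatever clustering the authors actually applied.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def detect_stops(track, min_gap_s=70):
    """Return (lat, lon, start_s, end_s) for every gap longer than min_gap_s.

    track: list of (t_seconds, lat, lon) for one car, sorted by time.
    """
    stops = []
    for (t0, lat, lon), (t1, _, _) in zip(track, track[1:]):
        if t1 - t0 > min_gap_s:
            stops.append((lat, lon, t0, t1))  # car assumed parked at the earlier fix
    return stops


def cluster_stops(stops, radius_m=10.0):
    """Greedy leader clustering: a stop joins the first cluster whose seed lies
    within radius_m, otherwise it seeds a new cluster. Returns cluster ids."""
    seeds, labels = [], []
    for lat, lon, _, _ in stops:
        for cid, (slat, slon) in enumerate(seeds):
            if haversine_m(lat, lon, slat, slon) <= radius_m:
                labels.append(cid)
                break
        else:
            seeds.append((lat, lon))
            labels.append(len(seeds) - 1)
    return labels


# Hypothetical track: two stops a few metres apart and one about 1.4 km away.
track = [
    (0, 36.05000, 24.88000),
    (100, 36.05005, 24.88000),
    (130, 36.06000, 24.89000),
    (1000, 36.05002, 24.88001),
    (1500, 36.05010, 24.88000),
]
stops = detect_stops(track)
print(cluster_stops(stops))
```

The leader approach is order dependent; a density-based method would be a reasonable alternative when stops are numerous.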
Panels: Engineering, Security, Executives, Information Technology, Facilities
Fig.3. Locations detected in Fig.2 are used as waypoints to generate an OD map. Maps on the left visualize the complete movement records from the GPS data, while maps on the right show the routine movements between clusters.
Panels: Engineering, Security, Executives, Information Technology, Facilities
Fig.4. The daily routine of GAStech employees presented as a directed graph. Each node represents a location and implies an activity undertaken in that locale. Edges encode the transitions between locations. Edge labels indicate the median hour of day at which the transition occurs.
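A directed graph like Fig.4 can be produced by writing the filtered edges in the Graphviz DOT language. The sketch below assumes the edges are already available as a mapping from (origin, destination) to a median hour expressed as a decimal number; the function name `to_dot` and the HH:MM label format are illustrative choices, not the authors' code.

```python
def to_dot(edges, name="routine"):
    """Render {(origin, dest): median_hour} as a Graphviz DOT digraph.

    Edge labels show the median hour of day formatted as HH:MM.
    """
    lines = [f"digraph {name} {{"]
    for (origin, dest), hour in sorted(edges.items()):
        hh, mm = divmod(round(hour * 60), 60)  # decimal hours -> hours, minutes
        lines.append(f'  "{origin}" -> "{dest}" [label="{hh:02d}:{mm:02d}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical edges; cluster names are illustrative only.
dot = to_dot({("Cafe", "GAStech"): 7.5, ("GAStech", "Golf"): 12.25})
print(dot)
```

Saving the output to a file and running `dot -Tpng routine.dot -o routine.png` renders it; Graphviz then handles node placement, which is one reason a graph layout tool suits this stage.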
MC2.2 – Identify up to twelve unusual events or patterns that you see in the data. If you identify more than twelve patterns during your analysis, focus your answer on the patterns you consider to be most important for further investigation to help find the missing staff members. For each pattern or event you identify, describe
a. What is the pattern or event you observe?
b. Who is involved?
c. What locations are involved?
d. When does the pattern or event take place?
e. Why is this pattern or event significant?
f. What is your level of confidence about this pattern or event? Why?
Please limit your answer to no more than twelve images and 1500 words.
Possible Credit Card Fraud
A comparison between transaction values in the credit card and loyalty card data reveals a large number of mismatches. Fig.5a shows the transaction mismatches that GAStech employees encounter. The bottom five individuals in the plot lack transaction data in one or both of the credit card and loyalty card datasets. Fig.5b compares the number of mismatches to legitimate transactions at each location. We observe that 26 of 34 locations are affected by mismatching transactions, and the five locations with the highest number of mismatches are (1) Hippokampos, (2) Brew’ve Been Served, (3) Guy’s Gyros, (4) Abila Zachoros and (5) Hallowed Grounds. While the number of mismatches is broadly proportional to the total number of transactions at each location, Katarina’s Cafe remains free of mismatches despite hosting the largest number of transactions. Although these locations are routinely visited by GAStech employees (fig.4), the lack of a distinct pattern suggests that there is no targeted attack on a specific group of individuals. Temporal analysis of the data (fig.6b) shows that mismatches peak four times a day and are most severe at exactly 12pm. Fig.6a indicates that the majority of mismatches occur at $20, $40, $60 or $80. The combination of discrete mismatch values and specific times of day is unusual and suggests that a systematic process is involved. Although this pattern hints at a possible case of credit card fraud, it may not be directly related to the missing GAStech employees.
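One way to count such mismatches is to match each credit card purchase against loyalty card records by person, location, date and value. The sketch assumes credit card records carry an hour of day while loyalty records are only resolved to the day, and that each loyalty record can confirm at most one purchase; the tuple layouts, names and example data are hypothetical. The resulting counts by location and by hour correspond to the kind of tallies plotted in Fig.5b and Fig.6b.

```python
from collections import Counter, defaultdict


def find_mismatches(cc, loyalty):
    """Return credit card purchases that have no loyalty record of equal value.

    cc:      list of (name, location, date, hour, price)
    loyalty: list of (name, location, date, price)   # day resolution assumed
    """
    pool = defaultdict(Counter)
    for name, loc, date, price in loyalty:
        pool[(name, loc, date)][round(price, 2)] += 1

    mismatches = []
    for name, loc, date, hour, price in cc:
        bucket = pool[(name, loc, date)]
        p = round(price, 2)
        if bucket[p] > 0:
            bucket[p] -= 1  # consume the matching loyalty record
        else:
            mismatches.append((name, loc, date, hour, price))
    return mismatches


# Hypothetical records for illustration only.
cc = [
    ("Employee A", "Location A", "2014-01-06", 12, 60.0),
    ("Employee A", "Location A", "2014-01-06", 19, 25.5),
    ("Employee B", "Location B", "2014-01-06", 13, 30.0),
]
loyalty = [
    ("Employee A", "Location A", "2014-01-06", 25.5),
    ("Employee A", "Location A", "2014-01-06", 40.0),
    ("Employee B", "Location B", "2014-01-06", 30.0),
]
mismatches = find_mismatches(cc, loyalty)
by_location = Counter(m[1] for m in mismatches)
by_hour = Counter(m[3] for m in mismatches)
print(by_location, by_hour)
```

Rounding to cents avoids false mismatches caused by floating-point noise in the price columns.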
Mismatch Between Credit Card and Loyalty Card Transaction
a) Number of Mismatches by Nationality
b) Number of Mismatches by Location
Fig.5. Bar plots of the number of transaction mismatches between credit card and loyalty card data by (a) employee nationality and (b) location.
Mismatch Between Credit Card and Loyalty Card Transaction
a) Number of Mismatches by Value
b) Number of Mismatches by Time of Day
Fig.6. Number of transaction mismatches by (a) the value of each mismatch and (b) the time of day at which mismatches occur.
Unique Gatherings
Gatherings may be inferred when employees are spatially and temporally collocated in the same cluster. The accuracy of our inference model therefore depends on the clustering proximity described in MC2.1. The flow diagram in fig.7 illustrates the gatherings that we detected. Each spline in the diagram represents an employee and is color-coded to reflect employment type. Each box represents a gathering and is numbered to correspond with a cluster on the tourist map in fig.2b. The width of each box encodes the duration of a gathering, while its height represents the number of people involved. We detect two unique gatherings involving only Tethyians that last more than 3 hours. Neither event recurs anywhere in the data. The highlighted box in fig.7a indicates a large gathering at Carnero Street involving 14 GAStech employees from Engineering and IT, which likely took place at Felix Balas' residence. The highlighted box in fig.7b shows the executive team gathering at Desafio Golf Course on a Sunday. The event involves GAStech CEO Sten Sanjorge Jr., who would have been a likely target for the Protectors of Kronos. Law enforcement should investigate whether any of the participants noticed suspicious individuals or activities in the vicinity.
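Detecting co-location from stationary intervals is essentially an interval-overlap problem per cluster. The sketch below assumes stays are given as (employee, cluster, start hour, end hour) and uses a sweep over arrival and departure events; the `min_people` threshold is a hypothetical parameter, while the default of more than 3 hours follows the text. It is an illustration of the idea, not the authors' implementation, and it does not handle one employee having two overlapping stays in the same cluster.

```python
from collections import defaultdict


def find_gatherings(stays, min_people=2, min_hours=3.0):
    """Detect spatiotemporal co-location from stationary intervals.

    stays: list of (employee, cluster, start_h, end_h), times in hours.
    Returns (cluster, start_h, end_h, participants) for every maximal period
    in which at least `min_people` employees are present together and which
    lasts more than `min_hours`.
    """
    by_cluster = defaultdict(list)
    for emp, cluster, start, end in stays:
        by_cluster[cluster].append((start, 1, emp))  # arrival event
        by_cluster[cluster].append((end, 0, emp))    # departure event

    gatherings = []
    for cluster, events in by_cluster.items():
        events.sort()  # at equal times, departures (0) are processed first
        present, participants, opened = set(), set(), None
        for t, kind, emp in events:
            if kind == 1:
                present.add(emp)
            else:
                present.discard(emp)
            if len(present) >= min_people:
                if opened is None:
                    opened = t  # gathering starts
                participants |= present
            elif opened is not None:
                if t - opened > min_hours:
                    gatherings.append((cluster, opened, t, sorted(participants)))
                opened, participants = None, set()
    return gatherings


# Hypothetical stays: a long evening gathering in cluster 7, a short lunch in cluster 3.
stays = [
    ("e1", 7, 18.0, 22.5),
    ("e2", 7, 18.5, 23.0),
    ("e3", 7, 19.0, 21.0),
    ("e4", 3, 12.0, 13.0),
    ("e5", 3, 12.5, 13.0),
]
print(find_gatherings(stays))
```

Here the evening gathering runs from 18.5 to 22.5 with three participants, while the half-hour overlap in cluster 3 is discarded.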
Detecting Unique Gatherings Based on Spatial Temporal Colocation
a) Large Gathering at Carnero Street
b) Executives Gathering at Desafio Golf Course
Fig.7. Flow diagram illustrating two instances where unique gatherings occurred. Each spline in the diagram represents an employee and is color-coded to reflect employment type. Each box represents a gathering and is numbered to correspond with a cluster on the tourist map in fig.2b. The width of each box encodes the duration of a gathering, while its height represents the number of people involved.
Suspicious Behavior by Kronesian GAStech Security Employees
We detect two clusters that were visited, on four occasions, only by Kronesian security employees with military experience. Loreto Bodrogi, Minke Mies, Inga Ferro and Hennie Osvaldo are the GAStech employees who visit these clusters between 11am and 1pm. Visits to these clusters on the weekend suggest that this activity is not work related. Connectivity analysis of the road network indicates that both clusters are situated in less accessible localities in the industrial region of the map (fig.8a). Well-connected zones tend to have more traffic, while less connected regions tend to be quieter. The absence of any transaction data or POI reference on the tourist map raises further suspicion.
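Both steps of this analysis can be sketched briefly. First, clusters whose visitors all belong to a given group are found with a subset test. Second, the text does not name the connectivity measure, so as an assumption the sketch uses closeness centrality on an unweighted road graph (stored as an adjacency dictionary), computed with breadth-first search. Function names, the visit-count threshold and the toy graph are hypothetical.

```python
from collections import deque


def exclusive_clusters(visits, group, min_visits=4):
    """Clusters visited only by members of `group`, at least `min_visits` times.

    visits: list of (employee, cluster) tuples, one per visit.
    """
    visitors, counts = {}, {}
    for emp, cluster in visits:
        visitors.setdefault(cluster, set()).add(emp)
        counts[cluster] = counts.get(cluster, 0) + 1
    return sorted(c for c, emps in visitors.items()
                  if emps <= group and counts[c] >= min_visits)


def closeness(graph):
    """Closeness centrality of each node in an unweighted road graph.

    graph: dict mapping node -> iterable of neighbouring nodes.
    Distances are hop counts over the component reachable from each node.
    """
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


# Hypothetical data for illustration only.
visits = [("s1", "X"), ("s2", "X"), ("s1", "X"), ("s3", "X"),
          ("s1", "Y"), ("e9", "Y"), ("s2", "Z")]
roads = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(exclusive_clusters(visits, {"s1", "s2", "s3"}), closeness(roads))
```

Low closeness values would mark the kind of poorly connected, quiet localities that the saturation encoding in Fig.8a highlights.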
Detecting Suspicious Locations and Employees
a) Locations of Suspicious Clusters
b) Suspicious Employees That Visited Both Clusters
Fig.8. Discovery of two suspicious clusters (a) and the Kronesian GAStech security employees (b) who visited both locations on four occasions. Saturation is employed to encode connectedness in (a).
Unusual Amount of Truck Movement
We discover an unusually high amount of truck movement on 16/01/14. Fig.9 shows a sharp increase in the number of waypoints generated by the trucks' onboard GPS devices. These waypoints were also generated much later in the day than the daily average. The lack of anomalies in the delivery routes suggests that the drivers were driving more slowly, making more trips than their daily average, or both.
Fig.9. Histogram reveals an unusually high amount of truck movements after 17:00 on 16/01/14.
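A day like 16/01/14 can also be flagged automatically by counting late waypoints per day and comparing each count with the other days. The 17:00 cut-off follows the Fig.9 caption; the input layout (a mapping from day to the hours of all truck GPS fixes) and the rule of flagging counts more than `k` standard deviations above the mean are assumptions made for this sketch.

```python
from statistics import mean, pstdev


def late_activity_outliers(fix_hours_per_day, cutoff=17.0, k=2.0):
    """Flag days with an unusually high number of GPS fixes after `cutoff`.

    fix_hours_per_day: dict mapping day -> list of fix times in decimal hours.
    A day is flagged when its late-fix count exceeds the mean over all days
    by more than k population standard deviations (hypothetical rule).
    """
    late = {day: sum(1 for h in hours if h >= cutoff)
            for day, hours in fix_hours_per_day.items()}
    mu, sigma = mean(late.values()), pstdev(late.values())
    return sorted(day for day, n in late.items() if n > mu + k * sigma)


# Hypothetical data: ten ordinary days and one day with heavy late activity.
normal_day = [8.0, 12.0, 17.5, 18.0, 19.0, 20.0, 21.0]  # five fixes after 17:00
fixes = {day: normal_day for day in range(6, 16)}
fixes[16] = [9.0] + [17.5] * 50
print(late_activity_outliers(fixes))
```

Because the outlier itself inflates the mean and deviation, a robust alternative (median and median absolute deviation) may be preferable with fewer days of data.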
Large transaction at Frydo’s Autosupply
An analysis of transactions per employee reveals that Lucas Alcazar spent a large sum of money at Frydo's Autosupply on 13/01/14. Fig.10 compares the transaction values at various locations. Larger values on the right of the plot belong to employees who operate trucks that shuttle between GAStech and various industrial locations; the locations of these transactions suggest work-related activities. The highlighted transaction is a distinct outlier. Although there are no records of what the transaction entails, the sum suggests that Alcazar might have purchased a vehicle. A key question raised by this conjecture is why he would do so when he already has a company car assigned.
Fig.10. Boxplot comparing the transaction values that occur at various locations. The highlighted transaction is a distinct outlier that should be investigated.
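The outlier in Fig.10 can be located numerically with the interquartile-range rule that boxplot whiskers conventionally use. This sketch assumes the conventional 1.5 × IQR fences and transactions given as (location, value) pairs; location names and values are hypothetical. Python's `statistics.quantiles` uses its default "exclusive" method here, so quartiles may differ slightly from other tools.

```python
from collections import defaultdict
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def outliers_by_location(transactions, k=1.5):
    """Group (location, value) pairs and return outliers per location."""
    by_location = defaultdict(list)
    for location, value in transactions:
        by_location[location].append(value)
    result = {}
    for location, values in by_location.items():
        if len(values) >= 2:  # quantiles() needs at least two data points
            found = tukey_outliers(values, k)
            if found:
                result[location] = found
    return result


# Hypothetical transactions for illustration only.
transactions = [("Location A", v) for v in (10, 12, 11, 13, 12, 11, 10, 14, 12, 10000)]
transactions += [("Location B", v) for v in (3, 4, 5)]
print(outliers_by_location(transactions))
```

Applying the rule per location, rather than globally, avoids flagging routinely large truck-related purchases as outliers.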
Unusual Transactions at Kronosmart
An unusual transaction by Lucas Alcazar and Ada Campo-Corrente at Kronosmart on 19/01/14 was found to take place without any supporting trace of vehicle movement. The transaction, occurring at 4am on a Sunday morning, suggests that both employees could have arranged to visit Kronosmart together and made their way there without using their company-provided vehicles. The lack of GPS data further suggests that the visit could have been a deliberate attempt to be discreet. It is possible that they used the car Lucas may have purchased on 13/01/14.
Fig.11. Timeline visualization shows employee activity over the course of one day. A thin horizontal grey line indicates that an employee is stationary while a thick horizontal grey line indicates that a transaction was made within the cluster where he/she was stationary. A thin vertical line in magenta indicates the transaction time. The highlighted region of the timeline illustrates examples where a transaction was made without supporting GPS information.
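Transactions without supporting GPS information, like those highlighted in Fig.11, can be found by checking each transaction against the same employee's stationary intervals. The sketch assumes transactions have already been mapped to clusters, that all times refer to the same day in decimal hours, and that a tolerance (`tolerance_h`, a hypothetical parameter) absorbs clock differences between card terminals and GPS devices.

```python
def unsupported_transactions(transactions, stays, tolerance_h=0.5):
    """Transactions not covered by any stationary interval of the same
    employee at the same cluster (within +/- tolerance_h hours).

    transactions: list of (employee, cluster, hour)
    stays:        list of (employee, cluster, start_h, end_h)
    """
    intervals = {}
    for emp, cluster, start, end in stays:
        intervals.setdefault((emp, cluster), []).append((start, end))

    flagged = []
    for emp, cluster, hour in transactions:
        candidates = intervals.get((emp, cluster), [])
        if not any(s - tolerance_h <= hour <= e + tolerance_h for s, e in candidates):
            flagged.append((emp, cluster, hour))  # no GPS evidence of being there
    return flagged


# Hypothetical data: a lunch purchase backed by a stop, and an early-morning one that is not.
stays = [("e1", "Cafe", 12.0, 13.0)]
transactions = [("e1", "Cafe", 12.5), ("e1", "Shop", 4.0)]
print(unsupported_transactions(transactions, stays))
```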
MC2.3 – Like most datasets, the data you were provided is imperfect, with possible issues such as missing data, conflicting data, data of varying resolutions, outliers, or other kinds of confusing data. Considering MC2 data is primarily spatiotemporal, describe how you identified and addressed the uncertainties and conflicts inherent in this data to reach your conclusions in questions MC2.1 and MC2.2. Please limit your response to no more than five images and 300 words.
Our analysis assumes that movement across the map involves a car and that employees commute only with the company-supplied vehicle. We learned that this is not always true: employees may choose other forms of transport or simply walk, since Abila measures only about 9.68 km x 5.37 km. The second assumption concerns location detection (fig.2). We assumed that credit card or loyalty card records can account for the activities taking place within a given cluster, but employees do not necessarily make a transaction at every venue they visit, and some purchases do not happen at a venue at all (e.g. online purchases), so not all activities can be accounted for. Finally, we employ an automatic process to assign employees from facilities to their respective trucks. The algorithm assumes that each employee is assigned to one truck for a whole day and does not consider the possibility of switching trucks in the middle of the day.
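The text does not detail the truck assignment algorithm, so the following is only a hypothetical sketch of one way to implement the one-truck-per-day assumption. Each facilities employee is scored against each truck by how many of their card transactions fall inside one of that truck's stops at the same location, and pairs are then assigned greedily from the highest score. All names, the input layout and `tolerance_h` are assumptions.

```python
def assign_trucks(day_transactions, truck_stops, tolerance_h=0.5):
    """Assign each employee to at most one truck for a single day.

    day_transactions: dict employee -> list of (location, hour)
    truck_stops:      dict truck -> list of (location, start_h, end_h)
    """
    scores = []
    for emp, txs in day_transactions.items():
        for truck, stops in truck_stops.items():
            # Count transactions that coincide with one of this truck's stops.
            score = sum(
                any(loc == stop_loc and start - tolerance_h <= hour <= end + tolerance_h
                    for stop_loc, start, end in stops)
                for loc, hour in txs
            )
            scores.append((score, emp, truck))

    # Greedy matching: highest score first, one truck per employee and vice versa.
    scores.sort(key=lambda item: (-item[0], item[1], item[2]))
    assignment, used = {}, set()
    for score, emp, truck in scores:
        if score > 0 and emp not in assignment and truck not in used:
            assignment[emp] = truck
            used.add(truck)
    return assignment


# Hypothetical data for one day.
day_transactions = {"d1": [("Quarry", 9.0), ("Plant", 14.0)], "d2": [("Port", 10.0)]}
truck_stops = {
    "T1": [("Quarry", 8.5, 9.5), ("Plant", 13.5, 14.5)],
    "T2": [("Port", 9.8, 10.5), ("Quarry", 15.0, 16.0)],
}
print(assign_trucks(day_transactions, truck_stops))
```

A greedy match is simple but not guaranteed optimal; an assignment solver such as the Hungarian method would maximize the total score. Like the original approach, this sketch cannot represent a driver switching trucks mid-day.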